Abstract

In this paper we prove asymptotic normality of the total length of 
external branches in Kingman's coalescent. The proof uses an embedded 
Markov chain, which can be described as follows: Take an urn with n black 
balls. Empty it in n steps according to the rule: In each step remove a 
randomly chosen pair of balls and replace it by one red ball. Finally 
remove the last remaining ball. Then the numbers $U_k$, $0 \le k \le n$, of
red balls after $k$ steps exhibit an unexpected property: $(U_0, \dots, U_n)$ and
$(U_n, \dots, U_0)$ are equal in distribution.
1 Introduction and results

Our main result in this paper is that the total length $L_n$ of all external branches
in Kingman's coalescent with $n$ external branches is asymptotically normal for
$n \to \infty$.

Kingman's coalescent (1982) consists of two components. First there are the
coalescent times $T_1 > T_2 > \dots > T_n = 0$. They are such that

\[ \binom{k}{2} (T_{k-1} - T_k) , \quad k = 2, \dots, n , \]

are independent, exponential random variables with expectation 1. Second
there are partitions $\pi_1 = \{\{1, \dots, n\}\}, \pi_2, \dots, \pi_n = \{\{1\}, \dots, \{n\}\}$ of the set
$\{1, \dots, n\}$, where the set $\pi_k$ contains $k$ disjoint subsets of $\{1, \dots, n\}$ and $\pi_{k-1}$
evolves from $\pi_k$ by merging two randomly chosen elements of $\pi_k$. Moreover,
$(T_n, \dots, T_1)$ and $(\pi_n, \dots, \pi_1)$ are independent. For convenience we put $\pi_0 := \emptyset$.
As is customary the coalescent can be represented by a tree with $n$ leaves
labelled from 1 to $n$. Each of these leaves corresponds to an external branch of
the tree. The other node of the branch with label $i$ is located at level

\[ \rho(i) := \max\{k \ge 1 : \{i\} \notin \pi_k\} \]

within the coalescent. The length of this branch is $T_{\rho(i)}$. The total external
length of the coalescent is given by

\[ L_n := \sum_{i=1}^n T_{\rho(i)} . \]

This quantity is of a certain statistical interest. Coalescent trees were
introduced by Kingman as a model for the genealogical relationship of $n$ individuals,
down to their most recent common ancestor. Mutations can be located
anywhere on the branches. Then mutations on external branches affect only
single individuals. This fact was used by Fu and Li (1993) in designing their
D-statistic and providing a test of whether or not data fit Kingman's coalescent.

Otherwise the literature has mainly studied single external branches.
The asymptotic distribution of $T_{\rho(i)}$ has been obtained by Caliebe et al
(2007), using a representation of its Laplace transform due to Blum and François
(2005). We address this issue in Section 6 below. Freund and Möhle (2009)
investigated the external branch length of the Bolthausen-Sznitman coalescent,
and Gnedin et al (2008) the $\Lambda$-coalescent.

Here is our main result. 



Theorem 1. As $n \to \infty$,

\[ \frac12 \sqrt{\frac{n}{\log n}} \, (L_n - 2) \xrightarrow{d} N(0,1) . \]



The proof will show that the limiting normal distribution originates from 
the random partitions and not from the exponential waiting times. 

A second glance at this result reveals a peculiarity: The normalization of
$L_n$ is carried out using its expectation, but only half of its variance. These two
terms have been determined by Fu and Li (1993) (with a correction given by
Durrett (2002)). They obtained

\[ \mathrm{E}(L_n) = 2 , \qquad \mathrm{Var}(L_n) = \frac{8 n h_n - 16 n + 8}{(n-1)(n-2)} \sim \frac{8 \log n}{n} \]

with $h_n := 1 + \frac12 + \dots + \frac1n$, the $n$-th harmonic number. Below we derive a more
general result.

To uncover this peculiarity we shall study the external lengths in more detail.
First we look at the point processes $\eta_n$ on $(0, \infty)$, given by $\eta_n = \sum_{i=1}^n \delta_{\sqrt{n}\, T_{\rho(i)}}$,
i.e.

\[ \eta_n(B) := \#\{i : \sqrt{n}\, T_{\rho(i)} \in B\} \qquad (1) \]

for Borel sets $B \subset (0, \infty)$.
Theorem 2. As $n \to \infty$ the point process $\eta_n$ converges in distribution, as
point processes on $(0, \infty]$, to a Poisson point process $\eta$ on $(0, \infty)$ with intensity
measure $\lambda(dx) = 8 x^{-3} \, dx$.

We use $(0, \infty]$ in the statement of Theorem 2 instead of $(0, \infty)$ since it is
stronger, including for example $\eta_n(a, \infty) \xrightarrow{d} \eta(a, \infty)$ for every $a > 0$. The
significance is that, as $n \to \infty$, there will be points clustering at 0 but not at
$\infty$. (Below in the proof we recall the definition of convergence in distribution
of point processes.)

Theorem 2 permits a first orientation. Since $\sqrt{n}\, L_n = \int x \, \eta_n(dx)$, one is
tempted to resort to infinitely divisible distributions. However, the intensity
measure $\lambda(dx)$ is slightly outside the range of the Lévy-Khinchin formula.
In short, this means that small points of $\eta_n$ have a dominant influence
on the distribution of $L_n$ and we are within the domain of the normal
distribution.

Thus let us look in more detail at the external lengths and focus on

\[ L_n^{\alpha,\beta} := \sum_{n^\alpha \le \rho(i) < n^\beta} T_{\rho(i)} , \quad 0 \le \alpha < \beta \le 1 , \]

which is the total length of those external branches having their internal nodes
between level $\lceil n^\alpha \rceil$ and $\lceil n^\beta \rceil$ within the coalescent. Obviously $L_n = L_n^{0,1}$.

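Proposition 3. For $0 \le \alpha < \beta \le 1$

\[ \mathrm{E}(L_n^{\alpha,\beta}) = 2 \Big( \frac{(n - \lceil n^\alpha \rceil)(n - \lceil n^\alpha \rceil + 1)}{n(n-1)} - \frac{(n - \lceil n^\beta \rceil)(n - \lceil n^\beta \rceil + 1)}{n(n-1)} \Big) \]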

and

\[ \mathrm{Var}(L_n^{\alpha,\beta}) \sim 8(\beta - \alpha) \frac{\log n}{n} \]

as $n \to \infty$.



In particular $\mathrm{E}(L_n^{1/2,1}) \sim \mathrm{E}(L_n^{0,1})$, whereas $\mathrm{Var}(L_n^{1/2,1}) \sim \frac12 \mathrm{Var}(L_n^{0,1})$.
Thus the proposition indicates that the systematic part of $L_n$ and its fluctuations
arise in different regions of the coalescent tree, the former close to the
leaves and the latter closer to the root.

Still this proposition gives an inadequate impression. 

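Theorem 4. For $0 \le \alpha < \beta < 1/2$

\[ \mathrm{P}(L_n^{\alpha,\beta} = 0) \to 1 \]

as $n \to \infty$. Moreover

\[ \sqrt{n}\, L_n^{0,1/2} \xrightarrow{d} \int_2^\infty x \, \eta(dx) \]

and for $1/2 \le \alpha < \beta \le 1$

\[ \frac{L_n^{\alpha,\beta} - \mathrm{E}(L_n^{\alpha,\beta})}{\sqrt{\mathrm{Var}(L_n^{\alpha,\beta})}} \xrightarrow{d} N(0,1) . \]

In addition $L_n^{\alpha,\beta}$ and $L_n^{\gamma,\delta}$ are asymptotically independent for $\alpha < \beta \le \gamma < \delta$.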
This result implies Theorem 1. In $L_n = L_n^{0,1/2} + L_n^{1/2,1}$ the summands are of
order $1/\sqrt{n}$ and $\sqrt{\log n / n}$, such that in the limit the second, asymptotically
normal component dominates. To this end, however, $n$ has to become exponentially
large, otherwise the few long branches, which make up $L_n^{0,1/2}$, cannot be
neglected and may produce extraordinarily large values of $L_n$. Thus the normal
approximation for the distribution of $L_n$ seems of little use for practical purposes.
One expects a fat right tail compared to the normal distribution. Indeed
$\int_2^\infty x \, \eta(dx)$ has finite mean but infinite variance.

This is illustrated by the following two histograms from 10000 values of $L_n$,
where the length of the horizontal axis to the right indicates the range of the
values.

[Figure: two histograms of simulated values of $L_n$, one for $n = 50$ and one for $n = 1000$.]

The heavy tails to the right are clearly visible. Also very large outliers appear:
For $n = 50$ the simulated values of $L_n$ range from 0.685 to 8.38, and for $n = 1000$
from 1.57 to 7.87.
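A simulation in the spirit of these histograms can be sketched in a few lines of Python. This is not the code used for the figures; the function names `external_length` and `sample_mean` are chosen here for illustration, and the sketch only tracks how many of the current branches are still external, which suffices because the merging pair is chosen uniformly at random.

```python
import random

def external_length(n, rng):
    """Return one sample of L_n, the total external branch length.

    Positions 0..external-1 among the current branches stand for the
    external branches; by exchangeability this labelling is harmless.
    """
    t = 0.0            # current time, starting at T_n = 0
    external = n       # external branches still present
    total = 0.0
    for k in range(n, 1, -1):                     # k branches are present
        t += rng.expovariate(k * (k - 1) / 2.0)   # T_{k-1} - T_k has mean 1/binom(k,2)
        i, j = rng.sample(range(k), 2)            # uniformly chosen pair of branches
        hit = (i < external) + (j < external)     # external branches ending at time t
        total += hit * t
        external -= hit
    return total

def sample_mean(n, reps, seed=0):
    """Average of `reps` independent samples of L_n (seeded for reproducibility)."""
    rng = random.Random(seed)
    return sum(external_length(n, rng) for _ in range(reps)) / reps
```

Since $\mathrm{E}(L_n) = 2$ for every $n$, `sample_mean(n, reps)` should be close to 2 for large `reps`; a histogram of repeated calls to `external_length` shows the skewness to the right described above.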

Also it turns out that the approximation of the variance in Proposition 3 is
good only for very large $n$. This can be seen already from the formula of Fu and
Li. To get an exact formula for the variance we look at a somewhat different
quantity, namely

\[ \hat L_n^{\alpha,\beta} := \sum_{i=1}^n \big( T_{\rho(i)} \wedge T_{\lfloor n^\alpha \rfloor} - T_{\rho(i)} \wedge T_{\lfloor n^\beta \rfloor} \big) \]

with $0 \le \alpha < \beta \le 1$, which is the portion of the external length between level
$\lfloor n^\alpha \rfloor$ and $\lfloor n^\beta \rfloor$ within the coalescent.

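Proposition 5. For $0 \le \alpha < 1$ with $m := \lfloor n^\alpha \rfloor$

\[ \mathrm{E}(\hat L_n^{\alpha,1}) = 2 \, \frac{n-m}{n-1} \]

and

\[ \mathrm{Var}(\hat L_n^{\alpha,1}) = \frac{8(h_{n-1} - h_{m-1})(n + 2m - 2)}{(n-1)(n-2)} - \frac{4(n-m)(4n + m - 5)}{(n-1)^2(n-2)} . \]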
For $\alpha = 0$ we recover the formula of Fu and Li. A similar expression holds for
$\hat L_n^{\alpha,\beta}$.

Proposition 3 and Theorem 4 carry over to $\hat L_n^{\alpha,\beta}$, up to a change in expectation
and with the limit $\sqrt{n}\, \hat L_n^{0,1/2} \xrightarrow{d} \int_2^\infty (x - 2) \, \eta(dx)$. The following histogram
from a random sample of size 10000 shows that already for $n = 50$ the distribution
of $\hat L_n^{1/2,1}$ fits the normal distribution well when using the values for
expectation and variance given in Proposition 5.
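The relation between the two variance formulas, and the slow approach to the asymptotic value $8 \log n / n$, can be checked with exact rational arithmetic. The following Python sketch evaluates the formula of Fu and Li and the variance of Proposition 5 as functions of $n$ and $m$; the function names are illustrative only.

```python
from fractions import Fraction
from math import log

def harmonic(n):
    """n-th harmonic number h_n as an exact fraction (h_0 = 0)."""
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))

def var_fu_li(n):
    """Var(L_n) = (8 n h_n - 16 n + 8) / ((n-1)(n-2)), for n >= 3."""
    return (8 * n * harmonic(n) - 16 * n + 8) / Fraction((n - 1) * (n - 2))

def var_prop5(n, m):
    """Variance in Proposition 5 for m = floor(n^alpha), 1 <= m < n, n >= 3."""
    first = (8 * (harmonic(n - 1) - harmonic(m - 1)) * (n + 2 * m - 2)
             / Fraction((n - 1) * (n - 2)))
    second = Fraction(4 * (n - m) * (4 * n + m - 5), (n - 1) ** 2 * (n - 2))
    return first - second

def asymptotic_ratio(n):
    """Exact Var(L_n) divided by its asymptotic equivalent 8 log(n) / n."""
    return float(var_fu_li(n)) / (8 * log(n) / n)
```

For instance `var_prop5(n, 1) == var_fu_li(n)` holds for all $n \ge 3$, and `asymptotic_ratio(50)` is about 0.68 while `asymptotic_ratio(1000)` is still only about 0.80, which quantifies how slowly the approximation $8 \log n / n$ becomes accurate.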



Our main tool for the proofs is a representation of $L_n$ by means of an embedded
Markov chain $U_0, U_1, \dots, U_n$, which is of interest in its own right. We shall
introduce it as an urn model. The relevant fact is that this model possesses an
unexpected hidden symmetry, namely it is reversible in time. This is our second
main result. For the proof we use another urn model, which allows reversal of
time in a simple manner.

The urn models are introduced and studied in Section 2. Proposition 3 is
proven in Section 3. Theorems 2 and 4 are derived in Section 4 and Proposition
5 in Section 5. In Section 6 we complete the paper by considering the length of
an external branch chosen at random.



2 The urn models 

Take an urn with n black balls. Empty it in n steps according to the rule: In 
each step remove a randomly chosen pair of balls and replace it by one red ball. 
In the last step remove the last remaining ball. Let 

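\[ U_k := \text{number of red balls in the urn after } k \text{ steps} . \]

Obviously $U_0 = U_n = 0$, $U_1 = U_{n-1} = 1$ and $1 \le U_k \le \min(k, n-k)$ for
$2 \le k \le n-2$. $U_0, \dots, U_n$ is a Markov chain with transition probabilities

\[ \mathrm{P}(U_{k+1} = u' \mid U_k = u) = \begin{cases} \binom{u}{2} \big/ \binom{n-k}{2} , & \text{if } u' = u - 1 , \\ u(n-k-u) \big/ \binom{n-k}{2} , & \text{if } u' = u , \\ \binom{n-k-u}{2} \big/ \binom{n-k}{2} , & \text{if } u' = u + 1 . \end{cases} \]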

We begin our study of the model by calculating expectations and covariances. 

Proposition 6. For $0 \le k \le l \le n$

\[ \mathrm{E}(U_k) = \frac{k(n-k)}{n-1} , \qquad \mathrm{Cov}(U_k, U_l) = \frac{k(k-1)(n-l)(n-l-1)}{(n-1)^2(n-2)} . \]
Proof. Imagine that the black balls are numbered from 1 to $n$. Let $Z_{ik}$ be the
indicator variable of the event that the black ball with number $i$ is not yet
removed after $k$ steps. Then $U_k = n - k - \sum_{i=1}^n Z_{ik}$ and consequently

\[ \mathrm{E}(U_k) = n - k - n \mathrm{E}(Z_{1k}) \]

and for $k \le l$ in view of $Z_{1l} \le Z_{1k}$

\[ \mathrm{Cov}(U_k, U_l) = \sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(Z_{ik}, Z_{jl}) = n(n-1) \mathrm{E}(Z_{1k} Z_{2l}) + n \mathrm{E}(Z_{1l}) - n^2 \mathrm{E}(Z_{1k}) \mathrm{E}(Z_{1l}) . \]

Also

\[ \mathrm{P}(Z_{1k} = 1) = \frac{\binom{n-1}{2}}{\binom{n}{2}} \cdots \frac{\binom{n-k}{2}}{\binom{n-k+1}{2}} = \frac{(n-k)(n-k-1)}{n(n-1)} \]

and for $k < l$

\[ \mathrm{P}(Z_{1k} = 1, Z_{2l} = 1) = \frac{\binom{n-2}{2}}{\binom{n}{2}} \cdots \frac{\binom{n-k-1}{2}}{\binom{n-k+1}{2}} \cdot \frac{\binom{n-k-1}{2}}{\binom{n-k}{2}} \cdots \frac{\binom{n-l}{2}}{\binom{n-l+1}{2}} = \frac{(n-k-1)(n-k-2)(n-l)(n-l-1)}{n(n-1)^2(n-2)} . \]

Our claim now follows by careful calculation. □
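For small $n$ the formulas of Proposition 6 can be confirmed by brute force. The following Python sketch (an illustration, not part of the proof) builds the exact law of the whole path $(U_0, \dots, U_n)$ from the transition probabilities of the urn and evaluates means and covariances with rational arithmetic; the names `urn_paths`, `mean` and `cov` are chosen here.

```python
from fractions import Fraction
from math import comb

def urn_paths(n):
    """Exact law of (U_0, ..., U_n): dict mapping each path to its probability."""
    paths = {(0,): Fraction(1)}
    for k in range(n - 1):                    # step k+1 acts on n-k >= 2 balls
        pairs = comb(n - k, 2)
        nxt = {}
        for path, p in paths.items():
            u = path[-1]                      # u red balls, n-k-u black balls
            moves = ((u - 1, comb(u, 2)),             # two red balls drawn
                     (u, u * (n - k - u)),            # one red, one black
                     (u + 1, comb(n - k - u, 2)))     # two black balls drawn
            for v, ways in moves:
                if ways:
                    key = path + (v,)
                    nxt[key] = nxt.get(key, 0) + p * Fraction(ways, pairs)
        paths = nxt
    return {path + (0,): p for path, p in paths.items()}   # last ball removed

def mean(paths, k):
    """E(U_k) under the exact path law."""
    return sum(p * path[k] for path, p in paths.items())

def cov(paths, k, l):
    """Cov(U_k, U_l) under the exact path law."""
    e_kl = sum(p * path[k] * path[l] for path, p in paths.items())
    return e_kl - mean(paths, k) * mean(paths, l)
```

With `paths = urn_paths(6)`, the values `mean(paths, k)` and `cov(paths, k, l)` can be compared directly with $k(n-k)/(n-1)$ and $k(k-1)(n-l)(n-l-1)/((n-1)^2(n-2))$ for $n = 6$.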

Note that these expressions for expectations and covariances are invariant under
the transformation $k \mapsto n - k$, $l \mapsto n - l$. This is not by coincidence:

Theorem 7. $(U_0, U_1, \dots, U_n)$ and $(U_n, U_{n-1}, \dots, U_0)$ are equal in distribution.

Proof. Leaving aside $U_0 = U_n = 0$ we have $U_k \ge 1$ a.s. for the other values of
$k$. Instead we shall look at $U'_k = U_k - 1$ for $1 \le k \le n - 1$. It turns out that
for this process one can specify a different dynamics, which is more lucid and
amenable to reversing time.

Consider the following alternative box scheme: There are two boxes A and
B. At the beginning A contains $n - 1$ black balls whereas B is empty. The
balls are converted in $2n - 2$ steps into $n - 1$ red balls lying in B. Namely, in
the steps number $1, 3, \dots, 2n - 3$ a randomly drawn ball from A is shifted to B
and in the steps number $2, 4, \dots, 2n - 2$ a randomly chosen black ball (whether
from A or B) is recolored to a red ball. These $2n - 2$ operations are carried out
independently.

For $1 \le k \le n - 1$ let

\[ U'_k := \text{number of red balls in box A after } 2k - 1 \text{ steps} , \]

that is at the moment after the $k$-th move and before the $k$-th recoloring. Obviously
the sequence is a Markov chain, also $U'_1 = 0$.
As to the transition probabilities note that after $2k - 1$ steps there are $n - k$
black balls in all and $n - k - 1$ balls in A. Thus given $U'_k = r$ there are $r$ red
and $n - k - r - 1$ black balls in A, and the remaining $r + 1$ black balls belong
to B. Then $U'_{k+1} = r + 1$ occurs only if in the next step the ball recolored
from black to red belongs to A and subsequently the ball shifted from A to B
is black. Thus

\[ \mathrm{P}(U'_{k+1} = r + 1 \mid U'_k = r) = \frac{n-k-r-1}{n-k} \cdot \frac{n-k-r-2}{n-k-1} = \binom{n-k-r-1}{2} \Big/ \binom{n-k}{2} . \]

Similarly $U'_{k+1} = r - 1$ occurs if the recolored ball belongs to B and next the
ball shifted from A to B is red. The corresponding probability is

\[ \mathrm{P}(U'_{k+1} = r - 1 \mid U'_k = r) = \frac{r+1}{n-k} \cdot \frac{r}{n-k-1} = \binom{r+1}{2} \Big/ \binom{n-k}{2} . \]

Since $U_1 = 1 = U'_1 + 1$ and in view of the transition probabilities of $(U_k)$ and
$(U'_k)$ we see that $(U_1, \dots, U_{n-1})$ and $(U'_1 + 1, \dots, U'_{n-1} + 1)$ indeed coincide in
distribution.

Next note that $U'_{n-1} = 0$. Therefore $U'_k$ can be considered as a function not
only of the first $2k - 1$ but also of the last $2n - 2k - 1$ shifting and recoloring steps.
Since the steps are independent, the process backwards is equally easy to handle.
Taking into account that backwards the order of moving and recoloring balls is
interchanged, one may just repeat the calculations above to obtain reversibility.

But this repetition can be avoided as well. Let us put our model more 
formally: Label the balls from 1 to n — 1 and write the state space as 

\[ S := \big\{ ((L_1, c_1), \dots, (L_{n-1}, c_{n-1})) \mid L_i \in \{A, B\},\ c_i \in \{b, r\} \big\} , \]

where $L_i$ is the location of ball $i$ and $c_i$ its color. Then in our model the first
and second coordinate are changed in turn from A to B and from b to r. This 
is done completely at random, starting within the first coordinates. Clearly we 
may interchange the role of the first and second coordinate. Thus our box model 
is equivalent to the following version: 

Again initially A contains n — 1 black balls whereas B is empty. Now in the 
steps number 1, 3, . . . , 2n — 3 a randomly chosen black ball is recolored to a red 
ball and in the steps number 2, 4, . . . , 2n — 2 a randomly drawn ball from A is 
shifted to B. Again these 2n — 2 operations are carried out independently. Here 
we consider 

\[ U''_k := \text{number of black balls in box B after } 2k - 1 \text{ steps} . \]

Then from the observed symmetry it is clear that $(U'_1, \dots, U'_{n-1})$ and
$(U''_1, \dots, U''_{n-1})$ are equal in distribution.

If we finally interchange both colors and boxes as well, then we arrive at the 
dynamics of the backward process. This finishes the proof. □ 
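The statement of Theorem 7 can be verified exhaustively for small $n$. The Python sketch below is only an illustration: it computes the probability of a path of the original urn directly from its transition probabilities and compares it with the probability of the reversed path. The function names are chosen here; candidate values are restricted to $0, \dots, \lfloor n/2 \rfloor$ because $U_k \le \min(k, n-k)$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def step_probability(n, k, u, v):
    """P(U_{k+1} = v | U_k = u) for the urn started with n black balls."""
    balls = n - k                         # balls in the urn after k steps
    if balls == 1:                        # final step: the last ball is removed
        return Fraction(int(v == 0))
    if v == u - 1:
        ways = comb(u, 2)                 # two red balls drawn
    elif v == u:
        ways = u * (balls - u)            # one red and one black ball drawn
    elif v == u + 1:
        ways = comb(balls - u, 2)         # two black balls drawn
    else:
        ways = 0
    return Fraction(ways, comb(balls, 2))

def path_probability(path):
    """Probability that (U_0, ..., U_n) equals the given tuple of length n + 1."""
    n = len(path) - 1
    if path[0] != 0:
        return Fraction(0)
    prob = Fraction(1)
    for k in range(n):
        u = path[k]
        if not 0 <= u <= n - k:           # more red balls than balls: impossible
            return Fraction(0)
        prob *= step_probability(n, k, u, path[k + 1])
    return prob

def is_reversible(n):
    """Compare the law of every candidate path with that of its time reversal."""
    values = range(n // 2 + 1)            # U_k never exceeds n/2
    return all(path_probability(p) == path_probability(p[::-1])
               for p in product(values, repeat=n + 1))
```

For $n = 5$, for example, the paths $(0,1,1,2,1,0)$ and $(0,1,2,1,1,0)$ are reversals of each other and both have probability $1/6$.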

There is a variant of our proof, which makes the reversibility of $(U'_k)$ manifest
in a different manner. Let again the balls be labelled from 1 to $n - 1$. Denote

$\nu_m$ := instance between 1 and $n - 1$, when ball $m$ is colored red,
$\sigma_m$ := instance between 1 and $n - 1$, when ball $m$ is shifted to box B.
Then from our construction it is clear that $\nu = (\nu_m)$ and $\sigma = (\sigma_m)$ are two
independent random permutations of the numbers $\{1, \dots, n - 1\}$. Moreover, at
instance $k$ (i.e. after $2k - 1$ steps) ball number $m$ is red and belongs to box A, if
it was colored before and shifted afterwards, i.e. $\nu_m < k < \sigma_m$. Thus we obtain
the formula

\[ U'_k = \#\{1 \le m \le n - 1 : \nu_m < k < \sigma_m\} \qquad (2) \]

and we may conclude the following result.

Corollary 8. Let $\nu$ and $\sigma$ be two independent random permutations of
$\{1, \dots, n - 1\}$. Then $(U_1, \dots, U_{n-1})$ is equal in distribution to the process

\[ \big( \#\{1 \le m \le n - 1 : \nu_m < k < \sigma_m\} + 1 \big)_{1 \le k \le n-1} . \]

Certainly this representation implies Theorem 7 again. Also it contains
additional information. For example, it is immediate that $U_k - 1$ has a hypergeometric
distribution with parameters $n - 1$, $k - 1$, $n - k - 1$.
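The representation by two permutations is convenient for sampling and for exact checks. The following Python sketch is illustrative only (the function names and the 0-based ball index are choices made here): it builds the path from a pair of permutations and, for small $n$, enumerates all pairs to obtain the exact law of $U_k - 1$, which can then be compared with the hypergeometric probabilities.

```python
import random
from fractions import Fraction
from itertools import permutations
from math import comb

def path_from_permutations(nu, sigma):
    """(U_1, ..., U_{n-1}) built from two permutations of 1..n-1.

    nu[m] is the recoloring time and sigma[m] the shifting time of ball m
    (balls are indexed from 0 here, times run from 1 to n-1).
    """
    size = len(nu)                                    # size = n - 1
    return tuple(1 + sum(nu[m] < k < sigma[m] for m in range(size))
                 for k in range(1, size + 1))

def sample_path(n, rng=random):
    """Draw (U_1, ..., U_{n-1}) using two independent uniform permutations."""
    nu = rng.sample(range(1, n), n - 1)
    sigma = rng.sample(range(1, n), n - 1)
    return path_from_permutations(nu, sigma)

def exact_marginal(n, k):
    """Exact law of U_k - 1 by enumerating all pairs of permutations (small n only)."""
    perms = list(permutations(range(1, n)))
    weight = Fraction(1, len(perms) ** 2)
    law = {}
    for nu in perms:
        for sigma in perms:
            j = path_from_permutations(nu, sigma)[k - 1] - 1
            law[j] = law.get(j, 0) + weight
    return law

def hypergeometric(total, marked, draws, j):
    """P(j marked items among `draws` items drawn without replacement from `total`)."""
    return Fraction(comb(marked, j) * comb(total - marked, draws - j), comb(total, draws))
```

Here the "marked" balls are the $k - 1$ balls colored before instance $k$ and the "draws" are the $n - k - 1$ balls shifted after instance $k$, so `exact_marginal(n, k)` should agree with `hypergeometric(n - 1, k - 1, n - k - 1, j)`.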

The next example contains a first application of Theorem 7 to our original
urn model.

Example. Let us consider $\tau_n = \max\{k \ge 1 : U_{n-k} = k\}$, the number of red
balls in the urn, after the last black ball has been removed. From reversibility
$\tau_n$ has the same distribution as the moment $\tau'_n = \max\{k \ge 1 : U_k = k\}$, before
the first red ball is taken away from the urn. Thus

\[ \mathrm{P}(\tau_n \ge k) = \frac{\binom{n}{2} \binom{n-2}{2} \cdots \binom{n-2k+2}{2}}{\binom{n}{2} \binom{n-1}{2} \cdots \binom{n-k+1}{2}} = \frac{(n-k) \cdots (n-2k+1)}{(n-1) \cdots (n-k)} . \]

It follows for $t \ge 0$

\[ \mathrm{P}(\tau_n > t \sqrt{n}) \to e^{-t^2} \]

as $n \to \infty$. □

More generally the dynamics of our urn looks as follows: Clearly, if $n$ is
large, then in the beginning always two black balls are removed from the urn.
The rare moments, when red balls are taken away, appear with increasing rate.
Indeed it is not difficult to see that in the limit $n \to \infty$ and after a $\sqrt{n}$-scaling
of time these instances build up a Poisson process with linearly increasing rate.
As we have seen the picture remains the same after reversal of time. This will
be made more precise in Section 4.



We conclude this section by embedding our urn model into the coalescent. Let

\[ V_k := k - \#\{i : \rho(i) < k\} , \qquad (3) \]

and $U_k := V_{n-k}$, $0 \le k \le n$. Thus $V_k$ is the number of internal branches among
the $k$ branches after the $(n-k)$-th coalescing event and $U_k$ is the number of
internal branches among the $n - k$ branches after the $k$-th coalescing event. The
coalescing mechanism takes two random branches and combines them into one
internal branch. If we code the external branches by black balls and the internal
branches by red, this completely conforms to our urn model; thus $(U_0, \dots, U_n)$ is
as above. By Theorem 7, $(V_0, \dots, V_n)$ has the same distribution as $(U_0, \dots, U_n)$.
In the next sections we make use of the Markov chain $V_0, \dots, V_n$ and its properties.

Remark. For a different interpretation of the process $(U_k)$, suppose that we
have $n - 1$ pairs of (different) shoes, and that all left shoes are mixed in one
pile and all right shoes in another. We sort the shoes by taking first a left shoe
(at random), then a right shoe (also at random), then another left shoe, and so
on. As soon as we take a shoe that matches one that we already have picked,
we put away the pair; otherwise we put the shoe on the table in front of us.
If the pairs are numbered and $\nu_m$ is the time right shoe $m$ is picked, and $\sigma_m$
the time left shoe $m$ is picked, then right shoe $m$ is on the table when the $k$-th
left shoe has been picked if and only if $\nu_m < k < \sigma_m$, so by (2), the number
of right shoes remaining on the table when the $k$-th left shoe has been picked
is $U'_k$, $1 \le k \le n - 1$. The number of left shoes remaining on the table at the
same time is $U'_k + 1 = U_k$, so the total number of shoes on the table is $2U_k - 1$.

This is a variation of the sock-sorting process studied in Steinsaltz (1999) 
and Janson (2009), Section 8, which is similar except that there is no difference 
between left and right; we obtain it if we mix all shoes in one pile and pick from 
it at random. (See Janson (2009) for other interpretations, including priority 
queues, and further references.) It is not surprising that we have the same
asymptotic behaviour of $U_k$ and $\max_k U_k$ as for the sock-sorting problem. In
particular, we mention the following Gaussian process limit result, cf. Theorem
8.2 in Janson (2009). (This result is not used in the sequel.)

Theorem 9. As $n \to \infty$, the stochastic process $n^{-1/2} (U_{\lfloor nt \rfloor} - nt(1-t))$ converges
in $D[0, 1]$ to a continuous Gaussian process $Z(t)$ with mean $\mathrm{E}(Z(t)) = 0$
and covariance function

\[ \mathrm{E}(Z(s) Z(t)) = s^2 (1-t)^2 , \quad 0 \le s \le t \le 1 . \]

Sketch of proof. Note first that $\mathrm{E}(U_{\lfloor nt \rfloor}) = nt(1-t) + O(1)$ by Proposition 6.
It is easily seen that

\[ \mathrm{E}(U_{k+1} \mid U_k) = U_k - \frac{2}{n-k} U_k + 1 = \frac{n-k-2}{n-k} U_k + 1 \]

and it follows that

\[ M_k := \frac{U_k - \mathrm{E}(U_k)}{(n-k)(n-k-1)} = \frac{U_k}{(n-k)(n-k-1)} - \frac{k}{(n-1)(n-k-1)} , \]

$k = 0, 1, \dots, n - 2$, is a martingale.

Consider in the sequel only $k \le (1 - \delta) n$ for some fixed $\delta > 0$. Then
$\mathrm{Var}(M_k) \le (n-k-1)^{-4} \mathrm{Var}(U_k) = O(n^{-3})$, and it follows from Doob's inequality
that

\[ \max_{k \le (1-\delta)n} |U_k - \mathrm{E}(U_k)| = O_P(n^{1/2}) . \]

(Using Theorem 7 we see that this extends to $0 \le k \le n$.) A straightforward
computation of the conditional quadratic variation $\langle M, M \rangle_m :=
\sum_{k<m} \mathrm{E}((M_{k+1} - M_k)^2 \mid U_k)$ shows that, uniformly in $0 \le t \le 1 - \delta$,

\[ n^3 \langle M, M \rangle_{\lfloor nt \rfloor} \xrightarrow{p} \frac{t^2}{(1-t)^2} , \]

which implies, see Theorem VIII.3.11 in Jacod and Shiryaev (1987), that
$n^{3/2} M_{\lfloor nt \rfloor} \xrightarrow{d} \tilde Z(t)$ in $D[0, 1 - \delta]$, where $\tilde Z(t)$ is a Gaussian martingale given
by $\tilde Z(t) = W(t^2/(1-t)^2)$ for a standard Brownian motion $W(t)$. The result
follows, for $t \in [0, 1 - \delta]$, with $Z(t) = (1-t)^2 \tilde Z(t)$.

Since $\delta > 0$ is arbitrary, this yields convergence in $D[0, 1)$. By time-reversal
and Theorem 7 we also have convergence in $D(0, 1]$, and together these imply
convergence in $D[0, 1]$, see e.g. the proof in Janson (2009). □



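3 Proof of Proposition 3

We use the representation

\[ L_n^{\alpha,\beta} = \sum_{n^\alpha \le k < n^\beta} T_k X_k , \]

where

\[ X_k := \#\{i : \rho(i) = k\} , \]

$1 \le k < n$. In view of the coalescing procedure $X_k$ takes only the values 0, 1, 2,
and from the definition (3) of $V_k$

\[ X_k = 1 + V_k - V_{k+1} . \qquad (4) \]

From (4), $V_k = U_{n-k}$ and Proposition 6 we obtain after simple calculations

\[ \mathrm{E}(X_k) = \frac{2k}{n-1} , \qquad \mathrm{Var}(X_k) = \frac{2k(n-k-1)(n-3)}{(n-1)^2(n-2)} \qquad (5) \]

and for $k < l$

\[ \mathrm{Cov}(X_k, X_l) = - \frac{4k(n-l-1)}{(n-1)^2(n-2)} . \qquad (6) \]

Also from $T_k = \sum_{j=k+1}^n (T_{j-1} - T_j)$ we have $\mathrm{E}(T_k) = 2 \sum_{j=k+1}^n \frac{1}{j(j-1)}$ and
$\mathrm{Var}(T_k) = 4 \sum_{j=k+1}^n \frac{1}{j^2(j-1)^2}$; thus

\[ \mathrm{E}(T_k) = 2 \Big( \frac1k - \frac1n \Big) , \qquad \mathrm{Var}(T_k) \le \frac{c}{k^3} , \qquad (7) \]

for a suitable $c > 0$, independent of $n$.
Thus from independence

\[ \mathrm{E}(L_n^{\alpha,\beta}) = \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) \mathrm{E}(X_k) = \sum_{n^\alpha \le k < n^\beta} 2 \Big( \frac1k - \frac1n \Big) \frac{2k}{n-1} . \]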
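Now the first claim follows by simple computation.
Further from independence

\[ \mathrm{Var}\Big( \sum_{n^\alpha \le k < n^\beta} (T_k - \mathrm{E}(T_k)) X_k \Big) = \sum_{n^\alpha \le k, l < n^\beta} \mathrm{Cov}(T_k, T_l) \mathrm{E}(X_k X_l) . \qquad (8) \]

Using (5)-(7) we have for $k < l$,

\[ \mathrm{Cov}(T_k, T_l) \mathrm{E}(X_k X_l) = \mathrm{Var}(T_l) \mathrm{E}(X_k X_l) \le \mathrm{Var}(T_l) \mathrm{E}(X_k) \mathrm{E}(X_l) \le \frac{c}{l^3} \cdot \frac{4kl}{(n-1)^2} , \]

and it follows that

\[ 0 \le \sum_{n^\alpha \le k < l < n^\beta} \mathrm{Cov}(T_k, T_l) \mathrm{E}(X_k X_l) \le \sum_{n^\alpha \le k < l < n^\beta} \frac{4ck}{l^2} (n-1)^{-2} \le 4cn(n-1)^{-2} = O(n^{-1}) . \]

Consequently, (8) yields, using again (5)-(7),

\[ \mathrm{Var}\Big( \sum_{n^\alpha \le k < n^\beta} (T_k - \mathrm{E}(T_k)) X_k \Big) = \sum_{n^\alpha \le k < n^\beta} \mathrm{Var}(T_k) \mathrm{E}(X_k^2) + O(n^{-1}) \]
\[ \le c \sum_{n^\alpha \le k < n^\beta} \frac{1}{k^3} \Big( \frac{2k}{n-1} + \frac{4k^2}{(n-1)^2} \Big) + O(n^{-1}) \le \frac{6c}{n-1} \sum_{n^\alpha \le k < n^\beta} \frac{1}{k^2} + O(n^{-1}) = O(n^{-1}) . \qquad (9) \]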

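It remains to show that

\[ \mathrm{Var}\Big( \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) X_k \Big) \sim 8(\beta - \alpha) \frac{\log n}{n} . \]

Now

\[ \Big| \sum_{n^\alpha \le k < l < n^\beta} \mathrm{E}(T_k) \mathrm{E}(T_l) \mathrm{Cov}(X_k, X_l) \Big| \le \sum_{n^\alpha \le k < l < n^\beta} \frac{2}{k} \cdot \frac{2}{l} \cdot \frac{4k}{(n-1)^2} = O(n^{-1}) \]

and consequently

\[ \mathrm{Var}\Big( \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) X_k \Big) = \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k)^2 \mathrm{Var}(X_k) + O(n^{-1}) \]
\[ = \sum_{n^\alpha \le k < n^\beta} 4 \Big( \frac1k - \frac1n \Big)^2 \frac{2k(n-k-1)(n-3)}{(n-1)^2(n-2)} + O(n^{-1}) . \]

This gives our claim.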
4 Proof of Theorems 2 and 4

In this section we use Theorem 7. Namely, $V_0, \dots, V_n$ is a Markov chain with
transition probabilities, which can be expressed by means of $X_1, \dots, X_{n-1}$ as
follows:

\[ \mathrm{P}(X_k = x \mid V_k = v) = \begin{cases} \binom{n-k-v}{2} \big/ \binom{n-k}{2} , & \text{if } x = 0 , \\ v(n-k-v) \big/ \binom{n-k}{2} , & \text{if } x = 1 , \\ \binom{v}{2} \big/ \binom{n-k}{2} , & \text{if } x = 2 . \end{cases} \]

We would like to couple these random variables with suitable independent random
variables taking values 0 or 1. Note that $V_k$ takes only values $v \le k$, thus for
$k < n/3$

\[ \binom{n-k-v}{2} \Big/ \binom{n-k}{2} \ge \binom{n-2k}{2} \Big/ \binom{n-k}{2} \ge \frac{n-3k}{n-k} . \]

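Therefore we may enlarge our model by means of random variables $Y_k$, $k < n/3$,
such that

\[ \mathrm{P}(X_k = x, Y_k = y \mid V_k = v, V_{k-1}, \dots, V_0, Y_{k-1}, \dots, Y_1) = \begin{cases} \frac{n-3k}{n-k} , & \text{if } x = 0, y = 0 , \\ \binom{n-k-v}{2} \big/ \binom{n-k}{2} - \frac{n-3k}{n-k} , & \text{if } x = 0, y = 1 , \\ v(n-k-v) \big/ \binom{n-k}{2} , & \text{if } x = 1, y = 1 , \\ \binom{v}{2} \big/ \binom{n-k}{2} , & \text{if } x = 2, y = 1 . \end{cases} \]

For $\mathrm{P}(X_k = x \mid V_k = v)$ this gives the above formula, whereas

\[ \mathrm{P}(Y_k = y \mid V_k = v, V_{k-1}, \dots, V_0, Y_{k-1}, \dots, Y_1) = \begin{cases} \frac{n-3k}{n-k} , & \text{if } y = 0 , \\ \frac{2k}{n-k} , & \text{if } y = 1 . \end{cases} \]

This means that the 0/1-valued random variables $Y_k$, $k < n/3$, are independent.
For convenience we put $Y_k = 0$ for $k \ge n/3$. A straightforward computation
gives

\[ \mathrm{E}(Y_k - X_k \mid V_k = v) = \frac{2(k-v)}{n-k} , \qquad (10) \]

\[ \mathrm{E}((Y_k - X_k)^2 \mid V_k = v) = \frac{2(k-v)}{n-k} + \frac{2v(v-1)}{(n-k)(n-k-1)} \le \frac{2(k-v)}{n-k} + \frac{2k(k-1)}{(n-k)(n-k-1)} \qquad (11) \]

for $k < n/3$. Since $k - \mathrm{E}(V_k) = k(k-1)/(n-1)$ from Proposition 6 it follows

\[ \mathrm{E}((Y_k - X_k)^2) \le \frac{4k(k-1)}{(n-k)(n-k-1)} . \qquad (12) \]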
Proof of Theorem 2. Recall that, by (1) and the definition of $X_k$,

\[ \eta_n = \sum_{i=1}^n \delta_{\sqrt{n}\, T_{\rho(i)}} = \sum_{k=1}^{n-1} X_k \delta_{\sqrt{n}\, T_k} . \qquad (13) \]

Recall also that $\eta_n \xrightarrow{d} \eta$ as point processes on the interval $(0, \infty]$ means
that $\int f \, d\eta_n \xrightarrow{d} \int f \, d\eta$ for every continuous $f$ with compact support in $(0, \infty]$,
or equivalently $\eta_n(B) \xrightarrow{d} \eta(B)$ for every relatively compact Borel subset $B$ of
$(0, \infty]$ such that $\eta(\partial B) = 0$ a.s. (Here $B$ is relatively compact if $B \subset [\delta, \infty]$ for
some $\delta > 0$.) See, for example, the Appendix in Janson and Spencer (2007) and
Chapter 16 (in particular Theorem 16.16) in Kallenberg (2002).
Let us first look at the point process

\[ \eta'_n := \sum_{k=1}^{n-1} Y_k \delta_{2\sqrt{n}/k} . \qquad (14) \]

For $0 < a < b \le \infty$

\[ \eta'_n([a, b)) = \sum_{2\sqrt{n}/b < k \le 2\sqrt{n}/a} Y_k \]

and

\[ \mathrm{E}(\eta'_n([a, b))) = \sum_{2\sqrt{n}/b < k \le 2\sqrt{n}/a} \frac{2k}{n-k} \to 4a^{-2} - 4b^{-2} , \]

thus we obtain from standard results on sums of independent 0/1-valued random
variables that $\eta'_n([a, b))$ has asymptotically a Poisson distribution. Also
$\eta'_n(B_1), \dots, \eta'_n(B_i)$ are independent for disjoint $B_1, \dots, B_i$. Therefore we obtain
from standard results on point processes (for example Kallenberg (2002),
Proposition 16.17) weak convergence of $\eta'_n$ to the Poisson point process $\eta$ on
$(0, \infty]$ with intensity $8x^{-3} \, dx$.

Next we prove that for all $0 < a < b \le \infty$

\[ \eta_n([a, b)) - \eta'_n([a, b)) \to 0 \]

in probability. To this end note that from (12)

\[ \mathrm{E}\Big[ \sum_{k \le 2\sqrt{n}/a} (Y_k - X_k)^2 \Big] = O(n^{-1/2}) , \]

which implies that $\mathrm{P}(X_k = Y_k \text{ for all } k \le 2\sqrt{n}/a) \to 1$. Therefore we may well
replace $Y_k$ by $X_k$ in $\eta'_n([a, b))$.

Also, by (7), $\sqrt{n}\, T_k - 2\sqrt{n}/k = \sqrt{n}\, T_k - \sqrt{n}\, \mathrm{E}(T_k) - 2/\sqrt{n}$. From (7) and
Doob's inequality for any $\varepsilon > 0$

\[ \mathrm{P}\Big( \max_{k \ge n^{2/5}} \sqrt{n}\, |T_k - \mathrm{E}(T_k)| \ge \varepsilon \Big) \le \frac{n}{\varepsilon^2} \mathrm{Var}(T_{\lceil n^{2/5} \rceil}) = O(n^{-1/5}) . \]

Since $\mathrm{P}(Y_k = 0 \text{ for all } k \le n^{2/5}) \to 1$, we may as well also replace $2\sqrt{n}/k$
by $\sqrt{n}\, T_k$ in $\eta'_n$, which yields $\eta_n$ by (13) and (14) (use for example Kallenberg
(2002), Theorem 16.16). Thus the proof of Theorem 2 is complete. □

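Proof of Theorem 4. As to the first claim of Theorem 4 observe that the events
$\{L_n^{0,\beta} = 0\} = \{X_k = 0 \text{ for all } k < n^\beta\}$ and $\{V_{\lceil n^\beta \rceil} = \lceil n^\beta \rceil\}$ are equal. Thus

\[ \mathrm{P}(L_n^{\alpha,\beta} > 0) \le \mathrm{P}(L_n^{0,\beta} > 0) = \mathrm{P}(\lceil n^\beta \rceil - V_{\lceil n^\beta \rceil} \ge 1) \le \mathrm{E}(\lceil n^\beta \rceil - V_{\lceil n^\beta \rceil}) = \frac{\lceil n^\beta \rceil (\lceil n^\beta \rceil - 1)}{n-1} . \]

For $\beta < 1/2$ this quantity converges to zero, which gives the first claim of the
theorem.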

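For the next claim we use that because of (7) $\sqrt{n}\, T_{\lceil n^{1/2} \rceil}$ has expectation
$2 + O(n^{-1/2})$ and variance of order $n^{-1/2}$. Thus $\mathrm{P}(2 - \varepsilon < \sqrt{n}\, T_{\lceil n^{1/2} \rceil} < 2 + \varepsilon) \to 1$
for all $\varepsilon > 0$. This implies that the probability of the event

\[ \int_{[2+\varepsilon, \infty)} x \, \eta_n(dx) = \sqrt{n} \sum_{k=1}^{n-1} T_k X_k I_{\{\sqrt{n} T_k \ge 2+\varepsilon\}} \le \sqrt{n}\, L_n^{0,1/2} \le \sqrt{n} \sum_{k=1}^{n-1} T_k X_k I_{\{\sqrt{n} T_k \ge 2-\varepsilon\}} = \int_{[2-\varepsilon, \infty)} x \, \eta_n(dx) \]

goes to 1. Also for $a > 0$ from Theorem 2 $\int_a^\infty x \, \eta_n(dx) \to \int_a^\infty x \, \eta(dx)$ in
distribution. Altogether we obtain, letting $\varepsilon \to 0$,

\[ \sqrt{n}\, L_n^{0,1/2} \xrightarrow{d} \int_2^\infty x \, \eta(dx) , \]

which is our second claim.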

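As to the last claim of Theorem 4 we note that from (9)

\[ L_n^{\alpha,\beta} = \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) X_k + O_P(n^{-1/2}) \qquad (15) \]

in probability, and also in $L^2$. In this representation we would like to replace $X_k$ by
$Y_k$. We assume first $\beta < 1$. Note that for $\beta < 1$ in view of (7) and (12)

\[ \mathrm{Var}\Big( \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) \big( Y_k - X_k - \mathrm{E}(Y_k - X_k \mid V_k) \big) \Big) \le \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k)^2 \mathrm{E}((Y_k - X_k)^2) = O(n^{\beta - 2}) \]

and from (10), (7) and Proposition 6

\[ \mathrm{Var}\Big( \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) \mathrm{E}(Y_k - X_k \mid V_k) \Big) = \mathrm{Var}\Big( \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) \frac{2V_k}{n-k} \Big) \]
\[ \le 8 \sum_{n^\alpha \le k \le l < n^\beta} \frac{\mathrm{E}(T_k) \mathrm{E}(T_l)}{(n-k)(n-l)} \mathrm{Cov}(V_k, V_l) \le 32 \sum_{n^\alpha \le k \le l < n^\beta} \frac{1}{kl} \cdot \frac{k(k-1)(n-l)(n-l-1)}{(n-k)(n-l)(n-1)^2(n-2)} = O(n^{-1}) . \]

Thus $\sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) \big( Y_k - X_k - \mathrm{E}(Y_k - X_k) \big) = O_P(n^{-1/2})$ and (15) yields

\[ L_n^{\alpha,\beta} - \mathrm{E}(L_n^{\alpha,\beta}) = \sum_{n^\alpha \le k < n^\beta} \mathrm{E}(T_k) (Y_k - \mathrm{E}(Y_k)) + O_P(n^{-1/2}) . \]

Also $\mathrm{Var}\big( \frac1n \sum_{n^\alpha \le k < n^\beta} Y_k \big) \le n^{-2} \sum_{n^\alpha \le k < n^\beta} 2k/(n-k) = O(n^{-1})$, and because
of (7) we end up with

\[ L_n^{\alpha,\beta} - \mathrm{E}(L_n^{\alpha,\beta}) = 2 \sum_{n^\alpha \le k < n^\beta} \frac{Y_k - \mathrm{E}(Y_k)}{k} + O_P(n^{-1/2}) . \qquad (16) \]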

This is a representation of the external length by a sum of independent random 
variables. 

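Now $\mathrm{Var}(Y_k) = \frac{2k}{n-k} - \frac{4k^2}{(n-k)^2}$, thus for $\beta < 1$

\[ \mathrm{Var}\Big( 2 \sum_{n^\alpha \le k < n^\beta} \frac{Y_k - \mathrm{E}(Y_k)}{k} \Big) = 4 \sum_{n^\alpha \le k < n^\beta} \Big( \frac{2}{k(n-k)} - \frac{4}{(n-k)^2} \Big) \sim 8(\beta - \alpha) \frac{\log n}{n} . \]

Moreover for $\delta > 0$ we have $\mathrm{E}(|Y_k - \mathrm{E}(Y_k)|^{2+\delta}) \le \frac{2k}{n-k} + \big( \frac{2k}{n-k} \big)^{2+\delta} \le \frac{4k}{n-k}$ for
$k < n/3$, thus

\[ \sum_{n^\alpha \le k < n^\beta} k^{-2-\delta} \mathrm{E}(|Y_k - \mathrm{E}(Y_k)|^{2+\delta}) \le 4 \sum_{n^\alpha \le k < n^\beta} \frac{k^{-1-\delta}}{n-k} = O(n^{-1-\alpha\delta}) . \]

Thus for $\alpha \ge 1/2$ we get

\[ \sum_{n^\alpha \le k < n^\beta} \mathrm{E}\Big( \Big| \frac{Y_k - \mathrm{E}(Y_k)}{k} \Big|^{2+\delta} \Big) = o\Big( \Big( \frac{\log n}{n} \Big)^{1+\delta/2} \Big) \]

and we may use Lyapunov's criterion for the central limit theorem. Consequently,
(16) implies

\[ \frac{L_n^{\alpha,\beta} - \mathrm{E}(L_n^{\alpha,\beta})}{\sqrt{8(\beta - \alpha) \log n / n}} \xrightarrow{d} N(0,1) . \]

This finishes the proof in the case $\beta < 1$, using Proposition 3.

The case $\beta = 1$ then follows from $L_n^{\alpha,1} = L_n^{\alpha,1-\varepsilon} + L_n^{1-\varepsilon,1}$, using Proposition 3.
The last claim on asymptotic independence follows from (16), too. □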
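5 Proof of Proposition 5

Let $0 \le \alpha < 1$ and $m = \lfloor n^\alpha \rfloor$. Since $k - V_k = \#\{i : \rho(i) < k\}$ is the number of
external branches, which are found between level $k - 1$ and $k$,

\[ \hat L_n^{\alpha,1} = \sum_{m < k \le n} (T_{k-1} - T_k)(k - V_k) . \]

From independence

\[ \mathrm{E}(\hat L_n^{\alpha,1}) = \sum_{m < k \le n} \frac{1}{\binom{k}{2}} \cdot \frac{k(k-1)}{n-1} = 2 \, \frac{n-m}{n-1} . \]

This gives the first claim. Next, letting

\[ E_n := \mathrm{E}(\hat L_n^{\alpha,1} \mid V_0, \dots, V_n) = \sum_{m < k \le n} \frac{k - V_k}{\binom{k}{2}} , \]

we have

\[ \mathrm{Var}(\hat L_n^{\alpha,1}) = \mathrm{Var}(\hat L_n^{\alpha,1} - E_n) + \mathrm{Var}(E_n) . \]

Now, using Proposition 6,

\[ \mathrm{Var}(\hat L_n^{\alpha,1} - E_n) = \sum_{m < k \le n} \mathrm{E}\Big( \Big( T_{k-1} - T_k - \frac{1}{\binom{k}{2}} \Big)^2 \Big) \mathrm{E}((k - V_k)^2) \]
\[ = \sum_{m < k \le n} \frac{1}{\binom{k}{2}^2} \Big( \frac{k^2(k-1)^2}{(n-1)^2} + \frac{k(k-1)(n-k)(n-k-1)}{(n-1)^2(n-2)} \Big) \]
\[ = \frac{4(n-m)}{(n-1)^2} + 4 \sum_{m < k \le n} \frac{(n-k)(n-k-1)}{k(k-1)(n-1)^2(n-2)} \]

and

\[ \mathrm{Var}(E_n) = \sum_{m < k, l \le n} \frac{\mathrm{Cov}(V_k, V_l)}{\binom{k}{2} \binom{l}{2}} \]
\[ = 4 \sum_{m < k \le n} \frac{(n-k)(n-k-1)}{k(k-1)(n-1)^2(n-2)} + 8 \sum_{m < k < l \le n} \frac{(n-l)(n-l-1)}{l(l-1)(n-1)^2(n-2)} \]
\[ = 4 \sum_{m < k \le n} \frac{(n-k)(n-k-1)}{k(k-1)(n-1)^2(n-2)} + 8 \sum_{m < l \le n} \frac{(l-m-1)(n-l)(n-l-1)}{l(l-1)(n-1)^2(n-2)} . \]

Thus

\[ \mathrm{Var}(\hat L_n^{\alpha,1}) = \frac{4(n-m)}{(n-1)^2} + 8 \sum_{m < k \le n} \frac{(k-m)(n-k)(n-k-1)}{k(k-1)(n-1)^2(n-2)} . \]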
Now

\[ (k-m)(n-k)(n-k-1) = (k - 1 - (m-1)) \big( k(k-1) - 2(n-1)k + n(n-1) \big) \]
\[ = k(k-1)^2 - (2n + m - 3) k(k-1) + (n + 2m - 2)(n-1) k - mn(n-1) , \]

thus

\[ \tfrac12 (n-m)(n-2) + \sum_{m < k \le n} \frac{(k-m)(n-k)(n-k-1)}{k(k-1)} \]
\[ = \tfrac12 (n-m)(n-2) + \tfrac12 (n-m)(n+m-1) - (n-m)(2n+m-3) + (h_{n-1} - h_{m-1})(n+2m-2)(n-1) - \Big( \frac1m - \frac1n \Big) mn(n-1) \]
\[ = (h_{n-1} - h_{m-1})(n + 2m - 2)(n-1) - \tfrac12 (n-m)(4n + m - 5) . \]

Combining our formulas the result follows. □

6 The length of a random external branch 

Finally we look at the distribution of the length of an external branch chosen
at random. Equivalently, letting $\rho := \rho(1)$, we may consider

\[ R_n := T_\rho , \]

the length of the branch ending in the leaf with label 1. Its asymptotic distribution
can be obtained in an elementary manner and without recourse to the
results of the preceding sections. Recall $\rho := \max\{k \ge 1 : \{1\} \notin \pi_k\}$, thus

\[ \mathrm{P}(\rho < k) = \binom{k}{2} \Big/ \binom{n}{2} = \frac{k(k-1)}{n(n-1)} . \]

Letting

\[ R'_n := \sum_{k=\rho+1}^n \frac{1}{\binom{k}{2}} = 2 \Big( \frac{1}{\rho} - \frac{1}{n} \Big) , \]

$\{R'_n > r\} = \{\rho < 2n/(nr + 2)\}$ and for $x > 0$

\[ \mathrm{P}(n R'_n > x) = \mathrm{P}(\rho < 2n/(x + 2)) \to \frac{4}{(x+2)^2} . \]

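We show that this limiting result carries over to $R_n$. From

\[ R_n - R'_n = \sum_{k=\rho+1}^n \Big( T_{k-1} - T_k - \frac{1}{\binom{k}{2}} \Big) = \sum_{k=2}^n \Big( T_{k-1} - T_k - \frac{1}{\binom{k}{2}} \Big) I_{\{\rho < k\}} \]

it follows that

\[ \mathrm{P}\Big( R_n - R'_n \ne \sum_{\sqrt{n} < k \le n} \Big( T_{k-1} - T_k - \frac{1}{\binom{k}{2}} \Big) I_{\{\rho < k\}} \Big) \le \mathrm{P}(\rho < \sqrt{n}) = o(1) . \]

Also from independence

\[ \mathrm{E}\Big[ \Big( \sum_{\sqrt{n} < k \le n} \Big( T_{k-1} - T_k - \frac{1}{\binom{k}{2}} \Big) I_{\{\rho < k\}} \Big)^2 \Big] = \sum_{\sqrt{n} < k \le n} \mathrm{E}\Big[ \Big( T_{k-1} - T_k - \frac{1}{\binom{k}{2}} \Big)^2 \Big] \mathrm{P}(\rho < k) \]
\[ = \sum_{\sqrt{n} < k \le n} \frac{1}{\binom{k}{2}^2} \cdot \frac{\binom{k}{2}}{\binom{n}{2}} = \sum_{\sqrt{n} < k \le n} \frac{1}{\binom{k}{2} \binom{n}{2}} = O(n^{-5/2}) . \]

Consequently $R_n = R'_n + o(n^{-1})$ in probability. Thus we end up with the
following result, which was obtained by Caliebe et al (2007) by means of Laplace
transform methods.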

Proposition 10. $n R_n$ converges in distribution to the law $\mu$ on $\mathbb{R}_+$ with density
$\mu(dx) = 8(x + 2)^{-3} \, dx$.
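The speed of this convergence can be inspected numerically for the approximating variable $R'_n = 2(1/\rho - 1/n)$. The Python sketch below is an illustration under the assumption that $\mathrm{P}(\rho < k) = k(k-1)/(n(n-1))$ for integers $1 \le k \le n$, which follows because leaf 1 avoids each merger among $j$ branches with probability $(j-2)/j$; the function names are chosen here.

```python
from math import ceil

def tail_exact(n, x):
    """P(n R'_n > x) for x >= 0, using P(rho < k) = k(k-1)/(n(n-1))."""
    bound = 2 * n / (x + 2)          # n R'_n > x  iff  rho < bound
    k = min(n, ceil(bound))          # for integer rho: rho < bound iff rho < ceil(bound)
    return k * (k - 1) / (n * (n - 1))

def tail_limit(x):
    """Limiting tail 4/(x+2)^2, whose negative derivative is the density 8(x+2)^(-3)."""
    return 4 / (x + 2) ** 2
```

Comparing `tail_exact(n, x)` with `tail_limit(x)` for increasing `n` shows the difference shrinking at rate roughly $1/n$.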